Smart Paper Notes

HyperPrompt: Prompt-based Task-Conditioning of Transformers

Yun He , Huaixiu Steven Zheng , Yi Tay , Jai Gupta , Yu Du , Vamsi Aribandi , Zhe Zhao , YaGuang Li , Zhao Chen , Donald Metzler

Categories: Natural Language Processing | Machine Learning

2022-03-01

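Prompt-Tuning is a new paradigm for fine-tuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, while at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performance over strong T5 multi-task learning baselines and parameter-efficient adapter variants, including Prompt-Tuning and HyperFormer++, on the Natural Language Understanding benchmarks GLUE and SuperGLUE across many model sizes.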
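
The idea of hyper-prompts acting as task-global memories that queries attend to can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes a single attention head, a purely linear hypernetwork, and that the generated prompts are prepended to the keys and values. All names (`hyper_prompts`, `attend`, `W_k`, `W_v`) and sizes are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hyper_prompts(task_emb, W, num_prompts, dim):
    """Assumed linear hypernetwork: task embedding -> num_prompts vectors of size dim."""
    flat = matvec(W, task_emb)  # length num_prompts * dim
    return [flat[i * dim:(i + 1) * dim] for i in range(num_prompts)]


def attend(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    dim = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)])
    return outputs, weights


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, seq_len, num_prompts, task_dim = 4, 3, 2, 5

task_emb = [rng.gauss(0, 1) for _ in range(task_dim)]
W_k = rand_matrix(rng, num_prompts * dim, task_dim)  # hypernetwork weights for key prompts
W_v = rand_matrix(rng, num_prompts * dim, task_dim)  # hypernetwork weights for value prompts
Q = rand_matrix(rng, seq_len, dim)
K = rand_matrix(rng, seq_len, dim)
V = rand_matrix(rng, seq_len, dim)

# Prompts are prepended to keys and values, so every query can attend to them.
P_k = hyper_prompts(task_emb, W_k, num_prompts, dim)
P_v = hyper_prompts(task_emb, W_v, num_prompts, dim)
outputs, weights = attend(Q, P_k + K, P_v + V)
```

Each query now distributes its attention over `num_prompts + seq_len` positions, while the output keeps the original sequence length; only the hypernetwork weights and task embedding would be task-conditioning parameters in this sketch.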

translated by Google Translate

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

Vamsi Aribandi , Yi Tay , Tal Schuster , Jinfeng Rao , Huaixiu Steven Zheng , Sanket Vaibhav Mehta , Honglei Zhuang , Vinh Q. Tran , Dara Bahri , Jianmo Ni

Categories: Natural Language Processing | Machine Learning

2021-11-22

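Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this paper introduces ExMix (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task families. Using ExMix, we study the effect of multi-task pre-training at the largest scale to date, and analyze co-training transfer among common families of tasks. Through this analysis, we show that manually curating an ideal set of tasks for multi-task pre-training is not straightforward, and that multi-task scaling can vastly improve models on its own. Finally, we propose ExT5: a model pre-trained using a multi-task objective of self-supervised span denoising and supervised ExMix. Via extensive experiments, we show that ExT5 outperforms strong T5 baselines on SuperGLUE, GEM, Rainbow, Closed-Book QA tasks, and several tasks outside of ExMix. ExT5 also significantly improves sample efficiency during pre-training.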

Fast Learning of Multidimensional Hawkes Processes via Frank-Wolfe

Renbo Zhao , Niccolò Dalmasso , Mohsen Ghassemi , Vamsi K. Potluru , Tucker Balch , Manuela Veloso

Categories: Machine Learning

2022-12-12

Hawkes processes have recently risen to the forefront of tools for modeling and generating sequential event data. Multidimensional Hawkes processes model both the self- and cross-excitation between different types of events and have been applied successfully in various domains such as finance, epidemiology and personalized recommendations, among others. In this work we present an adaptation of the Frank-Wolfe algorithm for learning multidimensional Hawkes processes. Experimental results show that our approach has better or on-par accuracy in terms of parameter estimation than other first-order methods, while enjoying a significantly faster runtime.
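
For readers unfamiliar with Frank-Wolfe, the generic projection-free update can be sketched as below. This is the textbook algorithm on a toy problem (a quadratic minimized over the probability simplex), not the paper's adaptation to Hawkes likelihoods; the objective, the constraint set, the step size `2 / (k + 2)` and all names are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, x0, num_iters=500):
    """Generic Frank-Wolfe over the probability simplex with step size 2 / (k + 2)."""
    x = list(x0)
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of <g, s>
        # is the vertex (one-hot vector) at the smallest gradient coordinate.
        i_min = min(range(len(x)), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)
        # Convex combination x <- (1 - gamma) * x + gamma * e_{i_min}; no projection needed.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_min] += gamma
    return x


# Toy objective f(x) = 0.5 * ||x - b||^2 with b inside the simplex, so the minimizer is b.
b = [0.1, 0.6, 0.3]


def grad(x):
    return [xi - bi for xi, bi in zip(x, b)]


x_hat = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0])
```

Every iterate is a convex combination of simplex vertices, so feasibility is maintained without any projection step, which is the usual motivation for choosing Frank-Wolfe over projected first-order methods.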

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

Kuan-Lin Chen , Daniel D. E. Wong , Ke Tan , Buye Xu , Anurag Kumar , Vamsi Krishna Ithapu

Categories: Machine Learning

2022-11-16

Most speech enhancement (SE) models learn a point estimate, and do not make use of uncertainty estimation in the learning process. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin. Due to unrestricted heteroscedastic uncertainty, the covariance introduces an undersampling effect, detrimental to SE performance. To mitigate undersampling, our approach inflates the uncertainty lower bound and weights each loss component with its uncertainty, effectively compensating severely undersampled components with more penalties. Our multivariate setting reveals common covariance assumptions such as scalar and diagonal matrices. By weakening these assumptions, we show that the NLL achieves superior performance compared to popular losses including the mean squared error (MSE), mean absolute error (MAE), and scale-invariant signal-to-distortion ratio (SI-SDR).
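
To make the link between the Gaussian NLL and the MSE concrete, here is a minimal sketch under a strong simplification: a real-valued target with one scalar variance per component, rather than the multivariate covariance over complex spectra described above. The function names and toy numbers are hypothetical, and the constant `0.5 * log(2 * pi)` is dropped.

```python
import math


def gaussian_nll(targets, means, log_vars):
    """Mean per-component Gaussian NLL, omitting the constant 0.5 * log(2 * pi)."""
    terms = [
        0.5 * (lv + (t - m) ** 2 / math.exp(lv))
        for t, m, lv in zip(targets, means, log_vars)
    ]
    return sum(terms) / len(terms)


def mse(targets, means):
    return sum((t - m) ** 2 for t, m in zip(targets, means)) / len(targets)


targets = [1.0, 0.0, -1.0]
means = [0.9, 0.2, -0.5]

# With unit variance everywhere (log-variance 0), the NLL reduces to half the MSE.
nll_unit = gaussian_nll(targets, means, [0.0, 0.0, 0.0])

# Heteroscedastic case: the third component's variance matches its squared error (0.25).
nll_hetero = gaussian_nll(targets, means, [0.0, 0.0, math.log(0.25)])
```

The unit-variance case shows that the MSE corresponds to a fixed scalar covariance assumption; letting the variance vary per component rescales each squared error by its predicted uncertainty.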

Towards Improved Room Impulse Response Estimation for Speech Recognition

Anton Ratnarajah , Ishwarya Ananthabhotla , Vamsi Krishna Ithapu , Pablo Hoffmann , Dinesh Manocha , Paul Calamia

Categories: Artificial Intelligence

2022-11-08

We propose to characterize and improve the performance of blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a GAN-based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 72% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).

Online Learning for Mixture of Multivariate Hawkes Processes

Mohsen Ghassemi , Niccolò Dalmasso , Simran Lamba , Vamsi K. Potluru , Sameena Shah , Tucker Balch , Manuela Veloso

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-16

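Online learning of Hawkes processes has received increasing attention in the last couple of years, especially for modeling a network of actors. However, these works typically model either the rich interaction between events, or the latent clusters of the actors, or the network structure between the actors. We propose to model the latent structure of the network of actors together with their rich interaction across events, in real-world settings of medical and financial applications. Experimental results on both synthetic and real-world data showcase the efficacy of our approach.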

Differentially Private Learning of Hawkes Processes

Mohsen Ghassemi , Eleonora Kreačić , Niccolò Dalmasso , Vamsi K. Potluru , Tucker Balch , Manuela Veloso

Categories: Machine Learning (Statistics) | Machine Learning

2022-07-27

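Hawkes processes have recently gained increasing attention from the machine learning community for their versatility in modeling event sequence data. While they have a rich history going back decades, some of their properties, such as the sample complexity of learning the parameters and of releasing differentially private versions, are yet to be thoroughly analyzed. In this work, we study standard Hawkes processes with background intensity $\mu$ and excitation function $\alpha e^{-\beta t}$. We provide both non-private and differentially private estimators of $\mu$ and $\alpha$, and obtain sample complexity results in both settings to quantify the cost of privacy. Our analysis exploits the strong mixing property of Hawkes processes and classical central limit theorem results for weakly dependent random variables. We validate our theoretical findings on both synthetic and real datasets.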
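
As a small illustration of the model above, the conditional intensity of a univariate Hawkes process with background intensity $\mu$ and excitation function $\alpha e^{-\beta t}$ is $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The sketch below only evaluates this intensity; it does not implement the estimators or the privacy mechanism, and the function name and toy event times are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    past = [ti for ti in event_times if ti < t]
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in past)


events = [1.0, 2.5, 2.7]

# Before any event, only the background intensity contributes.
lam_before = hawkes_intensity(0.5, events, mu=0.2, alpha=0.8, beta=1.5)

# After a burst of events, the intensity is raised and then decays at rate beta.
lam_after = hawkes_intensity(3.0, events, mu=0.2, alpha=0.8, beta=1.5)
```

Recent events contribute the most to the intensity, which is why clustered event times make further events more likely in the near future.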

RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing

Efthymios Tzinis , Yossi Adi , Vamsi Krishna Ithapu , Buye Xu , Paris Smaragdis , Anurag Kumar

Categories: Machine Learning

2022-02-17

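We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without the need for a single isolated in-domain speech or noise waveform. Our approach overcomes a limitation of previous methods, which depend on clean in-domain target signals and are therefore sensitive to any domain mismatch between training and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets, which are used to train the student network. Conversely, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also showcase that RemixIT can be combined with any separation model and applied to any semi-supervised or unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inner workings of our self-training scheme, in which the student model keeps obtaining better performance while observing severely degraded pseudo-targets.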
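
The permute-and-remix step described above can be sketched as follows. This is a toy illustration, not the authors' code: the teacher's speech and noise estimates are stand-in lists of samples, the permutation is applied only to the noise estimates within one batch, and the names `remix`, `speech_hat` and `noise_hat` are hypothetical.

```python
import random


def remix(speech_estimates, noise_estimates, rng):
    """Pair each estimated speech signal with a randomly permuted noise estimate.

    Returns the bootstrapped mixtures together with the pseudo-targets
    (speech, noise) that a student model would be trained to recover.
    """
    perm = list(range(len(noise_estimates)))
    rng.shuffle(perm)
    shuffled_noise = [noise_estimates[j] for j in perm]
    mixtures = [
        [s + n for s, n in zip(speech, noise)]
        for speech, noise in zip(speech_estimates, shuffled_noise)
    ]
    return mixtures, speech_estimates, shuffled_noise


rng = random.Random(0)

# Stand-ins for teacher outputs on a batch of two in-domain mixtures.
speech_hat = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]]
noise_hat = [[0.01, 0.02, 0.03], [0.05, 0.05, 0.05]]

mixtures, target_speech, target_noise = remix(speech_hat, noise_hat, rng)
```

In a full training loop, the student would be trained on `mixtures` to predict `target_speech` and `target_noise`, and the teacher would periodically be updated from the student's parameters.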

Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization

Hao Jiang , Calvin Murdock , Vamsi Krishna Ithapu

Categories: Computer Vision

2022-01-06

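Augmented reality devices have the potential to enhance human perception and enable other assistive functionalities in complex conversational environments. Effectively capturing the audio-visual context necessary for understanding these social interactions first requires detecting and localizing the voice activities of the device wearer and the surrounding people. These tasks are challenging due to their egocentric nature: the wearer's head motion may cause motion blur, surrounding people may appear at difficult viewing angles, and there may be occlusions, visual clutter, audio noise, and bad lighting. Under these conditions, previous state-of-the-art active speaker detection methods do not give satisfactory results. Instead, we tackle the problem in a new setting using both video and multi-channel microphone array audio. We propose a novel end-to-end deep learning approach that gives robust voice activity detection and localization results. In contrast to previous methods, our method localizes active speakers from all possible directions on the sphere, even outside the camera's field of view, while simultaneously detecting the device wearer's own voice activity. Our experiments show that the proposed method gives superior results, can run in real time, and is robust against noise and clutter.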
